Skip to content

ARROW-6317: [JS] Implement IPC message format alignment changes#5225

Closed
trxcllnt wants to merge 1 commit into
apache:ARROW-6313-flatbuffer-alignmentfrom
trxcllnt:ARROW-6314
Closed

ARROW-6317: [JS] Implement IPC message format alignment changes#5225
trxcllnt wants to merge 1 commit into
apache:ARROW-6313-flatbuffer-alignmentfrom
trxcllnt:ARROW-6314

Conversation

@trxcllnt

Copy link
Copy Markdown
Contributor

Implements the IPC message format alignment changes for ARROW-6313. In this PR the MessageReader can read messages with the old alignment, but RecordBatchWriter always produces messages with the new alignment. I can add a flag if others think it'd be useful to produce messages in the old format.

@trxcllnt

Copy link
Copy Markdown
Contributor Author

@wesm C++ <-> JS integration tests are passing:

@emkornfield

Copy link
Copy Markdown
Contributor

@TheNeuralBit do you have time to review?

@TheNeuralBit TheNeuralBit left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor comments. Thanks Paul!

Comment thread js/src/ipc/message.ts Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have some kind of assertion somewhere that verifies length > 0? What happens if a corrupt file has that value?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s effectively handled in the subsequent call. We use the length read here as the number of bytes to consume from the stream, which should be the flatbuffer metadata.

If the length is incorrect, the flatbuffer metadata created from the buffer will be empty. If the length is too long, the stream will read as many bytes as it can, and close automatically. Both cases are interpreted by the stream reader as bad input, and ignored.

Comment thread js/src/ipc/message.ts Outdated
Comment thread js/src/ipc/message.ts Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's too bad we have to duplicate this code. Could the readers be combined somehow? Not in this PR, just curious

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not. Async functions are implemented via promises, which don’t allow parameterizing the scheduler. Async functions force the caller to handle the returned promise, guaranteeing the function doesn’t finish its work synchronously, even if it could.

This is a key difference between JS promises and Python’s asyncio module, which does execute synchronously if no actual async work is performed.

@trxcllnt

trxcllnt commented Sep 4, 2019

Copy link
Copy Markdown
Contributor Author

@TheNeuralBit just pushed up some changes to add a writeLegacyIpcFormat flag to the RecordBatchStreamWriter options and clarify the comment in the reader.

@fsaintjacques

Copy link
Copy Markdown
Contributor

Is this ready for merging?

@trxcllnt

trxcllnt commented Sep 9, 2019

Copy link
Copy Markdown
Contributor Author

@fsaintjacques I believe so yes

@fsaintjacques

Copy link
Copy Markdown
Contributor

According to the mailing list it seems that we need to change the EOS marker size too. I'll wait for this to settle first.

@wesm

wesm commented Sep 9, 2019

Copy link
Copy Markdown
Member

I'll make a Format patch today with the changes and change the C++ library, the change shouldn't be too complex

@trxcllnt

trxcllnt commented Sep 9, 2019

Copy link
Copy Markdown
Contributor Author

I'm testing out the new EOS bytes change in JS right now, will push a commit here in a bit if it works.

@wesm

wesm commented Sep 9, 2019

Copy link
Copy Markdown
Member

Awesome, thanks!

@wesm wesm force-pushed the ARROW-6313-flatbuffer-alignment branch from 4f9b887 to 0352456 Compare September 11, 2019 22:07
@trxcllnt

Copy link
Copy Markdown
Contributor Author

@wesm have the C++ changes for the new 8-byte eos message been pushed to the integration branch? If so, I can pull locally and validate against the JS changes.

@wesm

wesm commented Sep 12, 2019

Copy link
Copy Markdown
Member

@trxcllnt I just pushed the change to the branch, so you can rebase and test

@wesm

wesm commented Sep 12, 2019

Copy link
Copy Markdown
Member

I just rebased

@wesm

wesm commented Sep 12, 2019

Copy link
Copy Markdown
Member

It looks like only the EOS writing is probably all that needs to be changed

@trxcllnt

Copy link
Copy Markdown
Contributor Author

@wesm I pulled the latest from this branch and rebuilt everything, but getting errors in the integration runner:

C++ producing, C++ consuming
==========================================================
Testing file /home/ptaylor/dev/arrow/integration/data/struct_example.json
==========================================================
-- Creating binary inputs
-- Validating file
-- Validating stream
Segmentation fault
Traceback (most recent call last):
  File "integration/integration_test.py", line 1284, in _compare_implementations
    run_binaries(producer, consumer, test_case)
  File "integration/integration_test.py", line 1315, in _produce_consume
    consumer.stream_to_file(producer_stream_path, consumer_file_path)
  File "integration/integration_test.py", line 1554, in stream_to_file
    self.run_shell_command(cmd)
  File "integration/integration_test.py", line 1389, in run_shell_command
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat /tmp/tmpbp24i1lj/4ee47ec3_struct_example.producer_file_as_stream | /home/ptaylor/dev/arrow/cpp/build/debug/arrow-stream-to-file > /tmp/tmpbp24i1lj/4ee47ec3_struct_example.consumer_stream_as_file' returned non-zero exit status 139.
JS producing, C++ consuming
==========================================================
Testing file /home/ptaylor/dev/arrow/integration/data/struct_example.json
==========================================================
-- Creating binary inputs
-- Validating file
-- Validating stream
Segmentation fault
Traceback (most recent call last):
  File "integration/integration_test.py", line 1284, in _compare_implementations
    run_binaries(producer, consumer, test_case)
  File "integration/integration_test.py", line 1315, in _produce_consume
    consumer.stream_to_file(producer_stream_path, consumer_file_path)
  File "integration/integration_test.py", line 1554, in stream_to_file
    self.run_shell_command(cmd)
  File "integration/integration_test.py", line 1389, in run_shell_command
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'cat /tmp/tmpbp24i1lj/5f87e545_struct_example.producer_file_as_stream | /home/ptaylor/dev/arrow/cpp/build/debug/arrow-stream-to-file > /tmp/tmpbp24i1lj/5f87e545_struct_example.consumer_stream_as_file' returned non-zero exit status 139.

@trxcllnt

Copy link
Copy Markdown
Contributor Author

@wesm nevermind, looks like it was a bad rebase. I just re-branched from ARROW-6313-flatbuffer-alignment and copied over only the JS changes and the integration tests are passing locally.

@wesm wesm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Awesome, thanks.

@wesm wesm closed this Sep 12, 2019
wesm pushed a commit that referenced this pull request Sep 13, 2019
Implements the IPC message format alignment changes for [ARROW-6313](https://issues.apache.org/jira/browse/ARROW-6313). In this PR the `MessageReader` can read messages with the old alignment, but `RecordBatchWriter` always produces messages with the new alignment. I can add a flag if others think it'd be useful to produce messages in the old format.

Closes #5225 from trxcllnt/ARROW-6314 and squashes the following commits:

b69bc0e <ptaylor> update ipc reader and writer for ARROW-6313

Authored-by: ptaylor <paul.e.taylor@me.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
pprudhvi pushed a commit to pprudhvi/arrow that referenced this pull request Sep 16, 2019
Implements the IPC message format alignment changes for [ARROW-6313](https://issues.apache.org/jira/browse/ARROW-6313). In this PR the `MessageReader` can read messages with the old alignment, but `RecordBatchWriter` always produces messages with the new alignment. I can add a flag if others think it'd be useful to produce messages in the old format.

Closes apache#5225 from trxcllnt/ARROW-6314 and squashes the following commits:

b69bc0e <ptaylor> update ipc reader and writer for ARROW-6313

Authored-by: ptaylor <paul.e.taylor@me.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants